Segmentation Strategies for Streaming Speech Translation

Authors

  • Vivek Kumar Rangarajan Sridhar
  • John Chen
  • Srinivas Bangalore
  • Andrej Ljolje
  • Rathinavelu Chengalvarayan

Abstract

The study presented in this work is a first effort at real-time speech translation of TED talks, a compendium of public talks with different speakers addressing a variety of topics. We address the goal of achieving a system that balances translation accuracy and latency. In order to improve ASR performance for our diverse data set, adaptation techniques such as constrained model adaptation and vocal tract length normalization are found to be useful. In order to improve machine translation (MT) performance, techniques that could be employed in real-time such as monotonic and partial translation retention are found to be of use. We also experiment with inserting text segmenters of various types between ASR and MT in a series of real-time translation experiments. Among other results, our experiments demonstrate that a good segmentation is useful, and a novel conjunction-based segmentation strategy improves translation quality nearly as much as other strategies such as comma-based segmentation. It was also found to be important to synchronize various pipeline components in order to minimize latency.
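
To make the idea of a conjunction-based segmenter between ASR and MT concrete, here is a minimal sketch in Python. It is not the authors' implementation: the conjunction list, the choice to cut before (rather than after) the conjunction, and the `min_len` guard are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Iterable, Iterator, List

# Hypothetical conjunction list; the abstract does not say which words were used.
CONJUNCTIONS = {"and", "but", "or", "so", "because"}


def segment_on_conjunctions(
    words: Iterable[str], min_len: int = 2
) -> Iterator[List[str]]:
    """Yield segments from a word stream, cutting just before each conjunction.

    min_len (an assumed parameter) prevents emitting very short segments when
    conjunctions occur close together.
    """
    current: List[str] = []
    for word in words:
        # Close the running segment when a conjunction arrives and the
        # segment is already long enough to hand to the translator.
        if word.lower() in CONJUNCTIONS and len(current) >= min_len:
            yield current
            current = []
        current.append(word)
    # Flush whatever remains when the stream ends.
    if current:
        yield current


if __name__ == "__main__":
    stream = "we built the system and it worked but latency was high".split()
    for segment in segment_on_conjunctions(stream):
        print(" ".join(segment))
```

Because the function is a generator over an iterable, each segment becomes available as soon as its closing conjunction is seen, which is the property a low-latency pipeline would need; a comma-based variant would instead cut on predicted punctuation.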

Similar resources

Optimizing Segmentation Strategies for Simultaneous Speech Translation

In this paper, we propose new algorithms for learning segmentation strategies for simultaneous speech translation. In contrast to previously proposed heuristic methods, our method finds a segmentation that directly maximizes the performance of the machine translation system. We describe two methods based on greedy search and dynamic programming that search for the optimal segmentation strategy....

Incremental Segmentation and Decoding Strategies for Simultaneous Translation

Simultaneous translation is the challenging task of listening to source language speech, and at the same time, producing target language speech. Human interpreters achieve this task routinely and effortlessly, using different strategies in order to minimize the latency in producing target language. Toward modeling the human interpretation process, we propose a novel input segmentation method us...

Automatic sentence segmentation and punctuation prediction for spoken language translation

This paper studies the impact of automatic sentence segmentation and punctuation prediction on the quality of machine translation of automatically recognized speech. We present a novel sentence segmentation method which is specifically tailored to the requirements of machine translation algorithms and is competitive with state-of-the-art approaches for detecting sentence-like units. We also des...

The influence of utterance chunking on machine translation performance

Speech translation systems commonly couple automatic speech recognition (ASR) and machine translation (MT) components. Hereby the automatic segmentation of the ASR output for the subsequent MT is critical for the overall performance. In simultaneous translation systems, which require a continuous output with a low latency, chunking of the ASR output into translatable segments is even more criti...

Publication date: 2013